DOCUMENT RESUME

ED 128 414                                          TM 005 598

AUTHOR          Sands, William A.
TITLE           Alternative Item Response Weighting Procedures:
                Development and Evaluation.
PUB DATE        [Sep 75]
NOTE            16p.; Paper presented at the Annual Conference of the
                Military Testing Association (17th, Fort Benjamin
                Harrison, Indiana, September 15-19, 1975); Also
                included in TM 005 585

EDRS PRICE      MF-$0.83 HC-$1.67 Plus Postage.
DESCRIPTORS     Admission (School); College Majors; College Students;
                *Comparative Analysis; Educational Guidance;
                *Interest Tests; *Military Personnel; Occupational
                Guidance; Predictive Ability (Testing); *Scoring
                Formulas; Statistical Analysis; *Test Interpretation;
                Test Validity; Vocational Interests; *Weighted
                Scores
IDENTIFIERS     Naval Academy; *Strong Vocational Interest Blank

ABSTRACT

In order to develop tools for use in the selection
and vocational-educational guidance of U.S. Naval Academy midshipmen,
three empirically-based scales, designed using the Strong Vocational
Interest Blank (SVIB), were developed to predict three criteria: (1)
disenrollment for academic reasons, (2) disenrollment for
motivational reasons, and (3) military aptitude. The Naval Academy
classes of 1971, 1972, and 1973 took the SVIB, and an empirical
criterion keying approach was used to select those items having the
75 best responses for each of four different academic major
groupings. Twenty alternative item response weighting methods were
evaluated. For each of the four problems, a number of different
response weighting methods had essentially the same effectiveness. A
parsimonious conclusion would suggest the continued use of the common
procedure of assigning positive or negative unit weights to the
responses. However, scale test-retest reliability and scoring costs
are two pertinent factors which should be included in an overall
evaluation of alternative item response weighting procedures for a
particular application. (BR)

BACKGROUND AND PURPOSE



The investigation described herein was accomplished under a program
at the Navy Personnel Research and Development Center, San Diego, CA.
The major purpose of this research program is the development of tools
for use in selection and vocational-educational guidance for the U. S.
Naval Academy midshipmen.

A rather substantial amount of research has been done on interests
and their relationship to various criteria (Campbell, 1966). Using the
Strong Vocational Interest Blank (SVIB), Abrahams and Neumann (1973)
developed empirically-based SVIB scales designed to predict three criteria
for the U. S. Naval Academy: (1) disenrollment for academic reasons;
(2) disenrollment for motivational reasons; and (3) military aptitude.
All three new scales evidenced significant relationships with their
respective criteria in cross-validation samples.

The Naval Academy "Majors Program" was initiated in 1969 and
subsequently revised. At present, academic majors are organized under three
broad areas: (I) Engineering-Weapons; (II) Mathematics-Science; and (III)
Humanities.

Recently, there has been a marked emphasis on the importance of the
technical majors included in Groups I and II. The 1974 edition of the
Majors Program, published by the Academy, outlines current policy:

The Naval Academy policy on the selection of majors is
clear. Each midshipman selects a major which will
meet the needs of the Navy and at the same time be
interesting to him. The needs of the Navy take first
priority and it has been determined that eighty percent of
the Class shall take a technical major, i.e., Group I or
II, and twenty percent may choose Group III. Hopefully,
the selection of majors by the Class of 1978 will meet
the 80/20 quota. If the desired distribution is not
obtained by an open, free selection process, steps
will be taken to adjust the distribution to meet the
Navy goals. (U.S. Naval Academy, p. 9)

Pursuant to this emphasis on technical majors, Neumann and Abrahams
(1974) developed a SVIB scale (E-S) designed to identify Naval Academy
applicants with engineering and science interests. The class of 1973 was
split into a key-development sample and a cross-validation sample. The
criterion employed was dichotomous: Engineering-Science major versus
"other" major. A biserial validity of .57 was obtained in the
cross-validation sample. Application of the new SVIB scale to the class of
1976 yielded a biserial validity of .62. The results of this
investigation were also reported in a paper presented at an Air Force symposium
(Abrahams & Neumann, 1974).

The Engineering-Science scale (E-S) was developed for use in the
selection of students from the pool of Academy applicants. The current
operational selection composite involves a number of different predictors.
The relationship between the E-S scale and the current predictors was
examined and the validity of alternative composites was evaluated against
various criteria; e.g., cumulative grade point average, major choice,
etc. To facilitate utilization of the research findings, results have
been forwarded by letter to the Dean of Admissions of the Naval Academy
(Neumann, 1975). Both the Disenrollment scale and the Engineering-Science
scale will be employed in computing the candidate multiple for applicants
for the class of 1980 (McKee, 1975).



PROCEDURE 

Instrument 

The 1966 edition of the Strong Vocational Interest Blank (SVIB) for
men contains 399 items dealing with occupational activities, school
subjects, etc. A person taking the SVIB is asked to endorse one of three
response alternatives for most of the items: "Like," "Indifferent," or
"Dislike."

Samples 

The U. S. Naval Academy classes of 1971, 1972, and 1973 took the
SVIB during their respective plebe summers. The sample data were edited
to remove persons who failed to graduate and persons who graduated with
a major in the management science area. The number of persons in each
specific major for each class is shown in Table 1.

Criteria 

Graduates in the specific majors were aggregated into broad areas,
as shown in the bottom of Table 1. Four separate problems were addressed:
(1) differentiation of Engineering-Weapons majors (Group I) from all others
(Groups II + III); (2) differentiation of Mathematics-Science majors
(Group II) from all others (Groups I + III); (3) differentiation of
Humanities majors (Group III) from all others (Groups I + II); and (4)
differentiation of Engineering-Weapons majors (Group I) from
Mathematics-Science majors (Group II). Previous research (Sands & McCullah, 1974)
has indicated that separating Group I persons from Group II persons on
the basis of their SVIB responses is more difficult than differentiating
Group III majors from all others.

Item Selection 

An "empirical criterion keying" approach was used to select those
SVIB items having the 75 best responses for each of the four problems
addressed. The proportion of high criterion group members who endorsed
each of the response alternatives for each of the items was computed.
The same proportion was computed for the low criterion group. Then the
absolute difference between these two endorsement rates was computed.
The items containing the 75 responses exhibiting the greatest absolute
differences between endorsement rates were selected for subsequent
weighting.
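
The selection rule described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the program used in the study: the function names, the 0/1 data layout, and the toy data are hypothetical; only the rule itself (rank responses by the absolute difference in endorsement rates and keep the best 75) comes from the text.

```python
def endorsement_rates(responses):
    """Proportion of persons endorsing each response alternative.

    `responses` is a list of persons; each person is a list of 0/1 flags,
    one flag per response alternative (1 = endorsed). Layout is assumed.
    """
    n_persons = len(responses)
    n_alternatives = len(responses[0])
    return [sum(person[j] for person in responses) / n_persons
            for j in range(n_alternatives)]


def select_best_responses(high_group, low_group, n_best=75):
    """Indices of the n_best alternatives with the largest absolute
    difference between high- and low-group endorsement rates."""
    p_high = endorsement_rates(high_group)
    p_low = endorsement_rates(low_group)
    abs_diff = [abs(h - l) for h, l in zip(p_high, p_low)]
    ranked = sorted(range(len(abs_diff)),
                    key=lambda j: abs_diff[j], reverse=True)
    return ranked[:n_best]


# Toy data (hypothetical): 4 persons per group, 3 response alternatives.
high = [[1, 0, 1], [1, 0, 0], [1, 1, 0], [1, 0, 1]]
low = [[0, 0, 1], [0, 1, 1], [1, 0, 0], [0, 0, 1]]
best = select_best_responses(high, low, n_best=2)
```

With the toy data, the endorsement rates are 1.00, .25, .50 for the high group and .25, .25, .75 for the low group, so the first and third alternatives are retained.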

Response Weighting 

Twenty alternative item response weighting methods were evaluated.
In each method, weights were assigned so that high scores would be
associated with the high criterion group, while the low criterion group
would tend to receive lower scores.

Method #1. Unit weights were assigned to the 75 responses with the
greatest absolute difference in endorsement rates. For those responses
endorsed by a greater proportion of high criterion group members than by
low criterion group members, a positive unit weight was assigned.
Conversely, a negative unit weight was given to responses endorsed by a
greater proportion of low criterion group members. Responses which were
not among the best 75 received weights of zero.
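
Method #1 can be sketched as follows. The function names and the example endorsement rates are hypothetical, and scoring a person by summing the weights of the endorsed responses is an assumption about how the keyed weights are applied.

```python
def unit_weights(p_high, p_low, n_best=75):
    """Method #1 sketch: +1 or -1 for the n_best responses, 0 elsewhere."""
    diffs = [h - l for h, l in zip(p_high, p_low)]
    ranked = sorted(range(len(diffs)),
                    key=lambda j: abs(diffs[j]), reverse=True)
    keyed = set(ranked[:n_best])
    weights = []
    for j, d in enumerate(diffs):
        if j not in keyed or d == 0:
            weights.append(0)          # not keyed, or no difference at all
        else:
            weights.append(1 if d > 0 else -1)
    return weights


def score(person, weights):
    """Scale score: sum of the weights of the responses a person endorsed."""
    return sum(w for flag, w in zip(person, weights) if flag)


# Hypothetical endorsement rates for four response alternatives.
p_high = [0.80, 0.30, 0.50, 0.10]
p_low = [0.40, 0.35, 0.75, 0.10]
w = unit_weights(p_high, p_low, n_best=2)
```

Here the two largest absolute differences belong to the first and third alternatives, so they receive +1 and -1 respectively and the remaining alternatives receive zero.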

TABLE 1

Number of Naval Academy Graduates by Individual Major
for the Classes of 1971, 1972 and 1973

                                            Number of Graduates
Individual Major                         1971   1972   1973   Total

Group I: Engineering-Weapons
  Aerospace Engineering                    76     51     38     165
  Electrical Engineering                   22     24     22      68
  General Engineering                       0      0     16      16
  Marine Engineering                       10     13      4      27
  Mechanical Engineering                   66     41     37     144
  Naval Architecture                       15      6     10      31
  Ocean Engineering                        13     18     32      63
  Systems Engineering                      23     16     12      51

Group II: Mathematics-Science
  Applied Science                          12     19     18      49
  Chemistry                                12     14     35      61
  Mathematics                              85     58     66     209
  Oceanography                             95    122    103     320
  Operations Analysis                      31     51     63     145
  Physical Science                          0      0      1       1
  Physics                                  29     43     28     100

Group III: Humanities
  American Political Systems               25     49     48     122
  International Security Affairs           53     54     56     163
  European Studies - French                 5      4      7      16
  European Studies - German                15      7      3      25
  European Studies - Italian                2      1      0       3
  Latin American Studies - Spanish         15     12      7      34
  Latin American Studies - Portuguese       0      0      2       2
  Far Eastern Studies - Chinese             0      3      1       4
  Soviet Studies - Russian                  2      2      3       7
  Economics                                23     12     11      46
  English                                  12      6     10      28
  History                                  19     20     30      69

                      1971         1972         1973         Total
Summary              N     %      N     %      N     %      N     %
  Group I          225    34    169    26    171    26    565    29
  Group II         264    40    307    48    314    47    885    45
  Group III        171    26    170    26    178    27    519    26

Note. The entries for Electrical Engineering (all columns), Operations
Analysis (1971 and 1972), and European Studies - German (1973 and Total)
are illegible in the source copy; the values shown are those implied by
the group totals in the summary.

Method #2. Those items having one or more responses receiving a
unit weight under the first method were dimensionalized, as suggested
by Campbell (1971). Each such item is considered as a continuum ranging
from "Like" at one extreme to "Dislike" at the opposite end. If one
end of the continuum received a unit weight under the first weighting
procedure, that response receives the same weight under this method
and the opposite end of the response continuum is assigned a unit weight
affixed with the opposite sign. If "Indifferent" was the only response
for an item which received a unit weight under the first weighting
procedure, the item obviously does not have the assumed underlying
continuum and, therefore, all responses for the item are assigned a weight
of zero under this second weighting method.
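
The dimensionalizing rule of Method #2 can be sketched for a single item as shown below. The function name and the dictionary layout are hypothetical. Two details are not specified in the text and are assumptions of this sketch: the "Indifferent" response is given a weight of zero whenever an end of the continuum is keyed, and an item with both ends already keyed keeps its two end weights unchanged.

```python
def dimensionalize(item_weights):
    """Method #2 sketch for one item.

    `item_weights` maps "Like", "Indifferent", and "Dislike" to the
    Method #1 weights (+1, -1, or 0) of that item's three alternatives.
    """
    like = item_weights["Like"]
    dislike = item_weights["Dislike"]
    if like == 0 and dislike == 0:
        # Neither end was keyed (at most "Indifferent" was): no continuum.
        return {"Like": 0, "Indifferent": 0, "Dislike": 0}
    if like == 0:
        like = -dislike        # mirror the keyed "Dislike" end
    elif dislike == 0:
        dislike = -like        # mirror the keyed "Like" end
    # Assumption: the midpoint of the continuum carries no weight.
    return {"Like": like, "Indifferent": 0, "Dislike": dislike}
```

For example, an item keyed only on "Like" (+1) becomes +1, 0, -1 across "Like," "Indifferent," and "Dislike," while an item keyed only on "Indifferent" is dropped from the key.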

Method #3. Multiple weights were chosen after examination of the
distribution of absolute differences between endorsement rates for the
best 75 responses. These weights (0, ±1, ±2, ±3, ±4, and ±5) were
assigned to responses according to the degree to which the two criterion
groups differed. Again, positive weights were attached to responses
endorsed by a greater proportion of the high criterion group than the
low criterion group. Negative multiple weights were assigned to
responses endorsed by a greater proportion of low than high criterion
group members. The positive and negative multiple weights were
assigned only to those responses receiving a unit weight under the first
method. Those responses receiving a zero weight under the first method
also receive a zero weight under this third method.

Method #4. The fourth weighting method is a dimensionalized
version of the third method. The assumption and procedure employed in
dimensionalizing were explained under the second weighting method
discussed above.

Method #5. Examination of the results of other investigations
suggested another set of multiple weights. These weights (0, ±3, ±4,
±5, and ±6) were assigned using essentially the same procedure as
explained above for the third method. Again, weights were assigned
only to those responses which received a unit weight under the first
method. A weight of zero was assigned to those responses having zero
weights under the first method.

Method #6. The sixth response weighting method is simply a
dimensionalized version of the fifth method.

Method #7. This response weighting method assigns weights based
upon the endorsement rate for the high criterion group, regardless of
the low criterion group endorsement rate.

Method #8. The eighth method uses weights based upon the
differences between the endorsement rates for the high and low criterion
groups.

Method #9. Like the previous response weighting method, this ninth
method employs the difference between the endorsement rates of the high
and low criterion groups. Under the present method, this difference is
squared and affixed with the sign of the nonsquared difference.
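
Methods #7 through #9 can be sketched together. The function names are hypothetical, and using the rate (or the difference) directly as the weight, without any further rescaling, is an assumption; the text says only that the weights are "based upon" these quantities.

```python
import math


def weight_method_7(p_high, p_low):
    """High-group endorsement rate; the low-group rate is ignored."""
    return p_high


def weight_method_8(p_high, p_low):
    """Difference between high- and low-group endorsement rates."""
    return p_high - p_low


def weight_method_9(p_high, p_low):
    """Squared difference carrying the sign of the unsquared difference."""
    d = p_high - p_low
    return math.copysign(d * d, d)
```

For a response endorsed by 30 percent of the high group and 50 percent of the low group, Method #8 gives a weight of about -.20 and Method #9 a weight of about -.04, while Method #7 gives .30 whatever the low-group rate is.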

Method #10. The tenth response weighting method utilized Bayes'
Theorem to estimate the probability that a person belongs to the high
criterion group, given that he endorsed a particular response
alternative. The desired posterior probability is a function of the
conditional probabilities of endorsing the response, given membership in each
of the criterion groups and the prior probabilities of belonging to each
of the criterion groups.
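
The posterior probability used in Method #10 follows directly from Bayes' Theorem for two groups. In the sketch below the function and argument names are hypothetical, and the example conditional probabilities are invented for illustration; the prior of .29 is the overall Group I proportion shown in the summary of Table 1.

```python
def posterior_high(p_resp_given_high, p_resp_given_low, prior_high):
    """P(high group | response endorsed) by Bayes' Theorem, two groups."""
    prior_low = 1.0 - prior_high
    numerator = p_resp_given_high * prior_high
    denominator = numerator + p_resp_given_low * prior_low
    return numerator / denominator


# Hypothetical response: endorsed by 60% of the high group, 30% of the low.
post = posterior_high(0.60, 0.30, prior_high=0.29)
```

When both groups endorse a response at the same rate, the posterior equals the prior, so the response carries no information about group membership.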

Method #11. The posterior probability of being a member of the
high criterion group, computed under the tenth method, may be considered
as a proportion. Specifically, of all the persons who endorsed a
particular alternative of an item, a certain proportion belong to the high
criterion group. Each proportion has a standard error which is a
function of the proportion itself and the sample size. This eleventh
method involves inversely weighting the proportion by the standard error
of the proportion.

The standard error of a proportion is influenced by the proportion
itself and the sample size for the response. Specifically, the inverse
of a standard error of a proportion is smallest when the proportion is
0.5 and becomes progressively larger as the proportion approaches zero
or unity. This means that, for a fixed sample size, extreme proportions
are weighted by a higher factor. The product of this factor (the
inverse of the standard error of a proportion) and the proportion itself
yields a very large weight for high proportions in comparison to the
weight for low proportions, for a fixed sample size.

On the other hand, for a fixed proportion, a larger weight is
assigned to those responses endorsed by a large number of persons than
is given to responses made by a small number of persons. This
characteristic of the eleventh weighting method reflects the belief that a
proportion computed in a large sample should be more stable than would
be the case for a small sample and, therefore, should receive a higher
weight.
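
The behavior described for Method #11 can be checked with a short sketch. The function names are hypothetical, and the usual binomial formula for the standard error of a proportion, sqrt(p(1 - p)/n), is assumed because the text does not print the formula.

```python
import math


def standard_error(p, n):
    """Binomial standard error of a proportion p based on n persons
    (assumed formula; undefined weight results when p is exactly 0 or 1)."""
    return math.sqrt(p * (1.0 - p) / n)


def weight_method_11(p, n):
    """Proportion multiplied by the inverse of its standard error."""
    return p / standard_error(p, n)
```

With n = 100, a proportion of .50 receives a weight of 10, a proportion of .90 a weight of 30, and a proportion of .10 a weight of about 3.3; quadrupling the sample size doubles each weight.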

Method #12. The proportion considered in the previous two methods
was weighted by a factor consisting of the ordinate on the unit normal
curve divided by the standard error of the proportion. The weight
assigned to a response under this twelfth method is influenced by the
proportion and the sample size. As was true for the eleventh method,
responses endorsed by a large number of persons receive more weight
than responses based upon a small sample, when the proportion is held
constant. The eleventh method modifies the original proportion by
assigning larger weights to extreme proportions for a fixed sample size.
This twelfth method reverses this strategy. The original proportion is
modified by assigning larger weights to proportions near 0.5 and
progressively smaller weights to extreme proportions near zero or unity.
The original proportion is weighted most heavily when the response
distribution is equally divided by the two criterion groups.
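
A sketch of the Method #12 factor is given below. Two points are assumptions rather than statements from the text: the ordinate is taken at the normal deviate that divides the unit normal curve into areas p and 1 - p (the convention used in biserial correlation), and the standard error is the binomial sqrt(p(1 - p)/n). The function names are hypothetical.

```python
import math
from statistics import NormalDist

UNIT_NORMAL = NormalDist()  # mean 0, standard deviation 1


def factor_method_12(p, n):
    """Ordinate of the unit normal curve at the deviate cutting off
    area p, divided by the binomial standard error of p (assumptions)."""
    z = UNIT_NORMAL.inv_cdf(p)        # requires 0 < p < 1
    ordinate = UNIT_NORMAL.pdf(z)
    se = math.sqrt(p * (1.0 - p) / n)
    return ordinate / se


def weight_method_12(p, n):
    """Proportion multiplied by the Method #12 factor."""
    return p * factor_method_12(p, n)
```

Under these assumptions the factor is largest at p = .50 and shrinks toward the extremes, which is the pattern the text describes, and it still grows with the sample size for a fixed proportion.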

Method #13. As mentioned above, the posterior probability of being
a member of the high criterion group, given the endorsement of a
particular response, is a proportion. This thirteenth method divides
this proportion by its complement and performs a natural logarithmic
transformation on the result.
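
The transformation in Method #13 is the natural logarithm of the odds. A minimal sketch follows; the function name is hypothetical.

```python
import math


def weight_method_13(p):
    """Natural log of the proportion divided by its complement.
    Defined only for 0 < p < 1."""
    return math.log(p / (1.0 - p))
```

A proportion of .50 yields a weight of zero, and proportions equally far above and below .50 yield weights of equal size and opposite sign.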

Method #14. A two-by-two contingency table was constructed for
each response alternative for each item evaluated. The rows of the
tables represent the two criterion groups (high and low) while the
columns represent a dichotomized response (absence versus presence).

A phi coefficient was computed for each response of each item
evaluated. This coefficient is used for the fourteenth response
weighting method.

Method #15. The coefficient of determination for a validity
coefficient is the square of the correlation. This coefficient of
determination represents the proportion of the criterion variance which
is explained by the predictor variable. The phi coefficient (the
validity for the dichotomized response, two criterion group problem)
was squared and affixed with the sign of the nonsquared difference.

Method #16. The sixteenth method uses the phi coefficient
computed for the fourteenth method and inversely weights the coefficient
by the standard error of the phi coefficient.

Method #17. The magnitude of any phi coefficient is constrained
by the marginal proportions of the two-by-two contingency table. In an
attempt to eliminate this constraining influence on the response
weights, the seventeenth method uses the ratio of the obtained phi
coefficient to the maximum phi coefficient possible under the given
conditions.
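
Methods #14, #15, and #17 can be sketched from the four cell counts of the contingency table. The function names and example counts are hypothetical. For Method #17 the limiting phi is found by building the most extreme table that has the same marginal totals; applying the same idea in the negative direction for negative coefficients is an assumption, since the text does not say how they were handled. Method #16 is omitted because the standard error formula used for phi is not given.

```python
import math


def phi_from_cells(a, b, c, d):
    """Phi coefficient. a: high group, response present; b: high, absent;
    c: low group, response present; d: low, absent."""
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom


def phi_limit(a, b, c, d, positive=True):
    """Most extreme phi attainable with the observed marginal totals."""
    n_high, n_present, n = a + b, a + c, a + b + c + d
    if positive:
        a_ext = min(n_high, n_present)
    else:
        a_ext = max(0, n_high + n_present - n)
    b_ext = n_high - a_ext
    c_ext = n_present - a_ext
    d_ext = n - a_ext - b_ext - c_ext
    return phi_from_cells(a_ext, b_ext, c_ext, d_ext)


def weight_method_14(a, b, c, d):
    return phi_from_cells(a, b, c, d)


def weight_method_15(a, b, c, d):
    phi = phi_from_cells(a, b, c, d)
    return math.copysign(phi * phi, phi)


def weight_method_17(a, b, c, d):
    phi = phi_from_cells(a, b, c, d)
    return phi / abs(phi_limit(a, b, c, d, positive=phi >= 0))


# Hypothetical counts: 100 persons per group.
cells = (60, 40, 30, 70)
```

For these counts phi is about .30, while the largest phi possible with the same marginals is about .90, so the Method #17 weight is one third.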

Method #18. The eighteenth response weighting method employs
Fisher's Z coefficient, a transformation of the correlation coefficient.
For two predictor categories (presence versus absence of the response)
and two criterion categories (high versus low), the phi coefficient is
the appropriate correlation.

Method #19. The nineteenth method uses the square of the Fisher's
Z coefficient computed for the previous method. This squared
coefficient is affixed with the sign of the original Z coefficient.

Method #20. The last method weights the Fisher's Z coefficient,
computed for the eighteenth method, by the inverse of the standard error
of the coefficient.
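
Methods #18 through #20 can be sketched with the standard Fisher transformation, Z = arctanh(r). The function names are hypothetical. For Method #20 the large-sample standard error of Z, 1/sqrt(n - 3), is assumed; the text does not state which formula was used.

```python
import math


def fisher_z(r):
    """Fisher's Z transformation of a correlation (here, phi)."""
    return math.atanh(r)


def weight_method_18(phi):
    return fisher_z(phi)


def weight_method_19(phi):
    """Squared Z carrying the sign of the original Z."""
    z = fisher_z(phi)
    return math.copysign(z * z, z)


def weight_method_20(phi, n):
    """Z divided by its assumed standard error, 1 / sqrt(n - 3)."""
    return fisher_z(phi) * math.sqrt(n - 3)
```

Because Z is nearly equal to the correlation itself when the correlation is small, Methods #14 and #18 can be expected to produce very similar weights for most responses.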

Scale Evaluation 

The SVIB item responses of every midshipman in the Classes of 1971,
1972, and 1973 were scored on each of the eighty scales (twenty weighting
methods for each of the four problems). Means and standard deviations
were computed separately for the high and low criterion groups for all
scales. Point-biserial validity coefficients were computed separately
for each year group, for each of the eighty scales. Finally, for each
separate scale, the two validities obtained in the Class of 1971 and the
Class of 1972 were averaged. Each validity coefficient was transformed
into a Fisher's Z coefficient, weighted by the appropriate degrees of
freedom, averaged, and then converted back to a correlation. This
weighted average cross-validity was used to assess the effectiveness of
each of the twenty alternative item response weighting procedures. The
point-biserial validities for the Class of 1973 were not used to
evaluate the twenty methods, as the scales were constructed on this group.
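
The averaging step can be sketched as follows. The function name is hypothetical, and two assumptions are made: the degrees of freedom for each Fisher's Z are taken as n - 3, and the class sizes are the column totals implied by the summary of Table 1 (660 for 1971 and 646 for 1972). The validities in the example are those reported for scale E01 in Table 2.

```python
import math


def weighted_average_validity(validities, sample_sizes):
    """Average correlations through Fisher's Z, weighting each Z by
    n - 3 (assumed degrees of freedom), then convert back to r."""
    zs = [math.atanh(r) for r in validities]
    dfs = [n - 3 for n in sample_sizes]
    mean_z = sum(z * df for z, df in zip(zs, dfs)) / sum(dfs)
    return math.tanh(mean_z)


# Scale E01: cross-validities of .379 (1971) and .299 (1972).
r_avg = weighted_average_validity([0.379, 0.299], [660, 646])
```

Under these assumptions the result rounds to .340, which agrees with the weighted average cross-validity listed for scale E01.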



RESULTS 

Engineering-Weapons Majors Versus Other Majors

Table 2 shows the point-biserial validity coefficients for each of
the twenty alternative scales for each of the three classes. These
scales are designed to differentiate persons with a major in the
Engineering-Weapons area from persons selecting a major in either of the
other two broad academic areas.

The last column in Table 2 presents a weighted average cross-validity
for each of the twenty scales. This validity is based upon the Classes
of 1971 and 1972. Scales E16 and E20 demonstrated the highest average
validity. A number of other scales showed closely similar validities.
Specifically, eleven of the twenty scales had weighted average
cross-validities within .002 of the highest average validity.

Mathematics-Science Majors Versus Other Majors 

Results for the scales designed to differentiate between persons 
majoring in the Mathematics-Science area and persons majoring in another 
area are shown in Table 3. The highest weighted average validity of the 
twenty scales was obtained by the M04 scale. The M17 and M19 scales 
demonstrated weighted average cross-validities within .002 of the highest 
one. 
TABLE 2

Effectiveness of Twenty Engineering-Science Scales

               Point-Biserial Validity Coefficients

         Key Development   Cross-Validation Samples   Weighted Average
Scale      Sample-1973        1971         1972        Cross-Validity

E01           .419            .379         .299            .340
E02           .397            .386         .307            .348
E03           .416            .379         .304            .342
E04           .396            .384         .310            .348
E05           .419            .379         .301            .341
E06           .399            .385         .308            .347
E07           .275            .265         .245            .255
E08           .415            .380         .309            .345
E09           .410            .378         .309            .344
E10           .419            .382         .310            .347
E11           .304            .299         .265            .282
E12           .352            .343         .296            .320
E13           .364            .3--         .298            .322
E14           .416            .384         .311            .348
E15           .413            .383         .309            .347
E16           .417            .384         .312            .349
E17           .408            .381         .312            .347
E18           .416            .384         .311            .348
E19           .413            .384         .311            .348
E20           .417            .384         .312            .349

Note. Dashes mark digits that are illegible in the source copy.
TABLE 3

Effectiveness of Twenty Mathematics-Science Scales

               Point-Biserial Validity Coefficients

         Key Development   Cross-Validation Samples   Weighted Average
Scale      Sample-1973        1971         1972        Cross-Validity

M01           .399            .180         .284            .232
M02           .354            .177         .290            .234
M03           .393            .183         .293            .238
M04           .351            .186         .298            .242
M05           .399            .180         .284            .232
M06           .354            .177         .290            .234
M07           .247            .134         .212            .173
M08           .396            .173         .294            .234
M09           .389            .174         .299            .237
M10           .392            .167         .261            .214
M11           .293            .154         .229            .191
M12           .296            .152         .230            .191
M13           .286            .158         .257            .207
M14           .392            .175         .299            .237
M15           .384            .175         .302            .239
M16           .395            .175         .298            .237
M17           .370            .176         .305            .241
M18           .392            .175         .299            .237
M19           .384            .177         .302            .240
M20           .395            .175         .298            .237
Humanities Majors Versus Other Majors

Table 4 presents the point-biserial validities for each class on
each of the twenty alternative scales designed to differentiate persons
majoring in the area of Humanities from persons majoring in the other
two broad academic areas. The highest weighted average cross-validity
was obtained by the H15 and H19 scales. Three other scales, H17, H18 and
H20, demonstrated weighted average cross-validities within .002 of the
highest one.

Engineering-Weapons Majors Versus Mathematics-Science Majors

The last of the four problems addressed in this study was to
differentiate persons majoring in the Engineering-Weapons area from
persons majoring in the Mathematics-Science area. The point-biserial
validity coefficients for the twenty alternative item response weighting
strategies for each of the three classes are presented in Table 5. The
T17 scale showed the highest average cross-validity. The average
validities obtained for the T04 and T06 scales were within .002 of the
highest scale.



DISCUSSION AND CONCLUSIONS

Most Effective Item Response Weighting Methods

The most striking characteristic of the twenty alternative item
response weighting methods is their general similarity in terms of
effectiveness. For each of the four problems addressed in this study,
a number of different response weighting methods have essentially the
same ability to differentiate between the high and low criterion groups.
A parsimonious conclusion would suggest the continued use of the common
procedure of assigning positive or negative unit weights to the responses,
i.e., the first method.

Least Effective Item Response Weighting Methods 

Unlike the situation for the most effective method, where a number
of techniques are essentially equivalent, there are two methods which
were consistently the least effective. Method #7 ranks as the worst
method of all twenty methods. This finding holds true across all four
problems. This method employs weights based upon the endorsement rate
of the high criterion group, and ignores the low criterion group
endorsement rate.

Method #11 was the second least effective method in each of the
four problems. This method weighted the posterior probability of high
criterion group membership by the inverse of the standard error.
TABLE 4

Effectiveness of Twenty Humanities Scales

               Point-Biserial Validity Coefficients

         Key Development   Cross-Validation Samples   Weighted Average
Scale      Sample-1973        1971         1972        Cross-Validity

H01           .582            .535         .549            .542
H02           .575            .528         .543            .535
H03           .585            .534         .556            .545
H04           .581            .528         .549            .538
H05           .586            .534         .556            .545
H06           .581            .530         .550            .540
H07           .258            .271         .149            .211
H08           .581            .535         .554            .544
H09           .585            .532         .557            .544
H10           .591            .531         .565            .548
H11           .422            .391         .306            .350
H12           .505            .464         .439            .442
H13           .584            .52-         .546            .534
H14           .587            .537         .562            .549
H15           .590            .532         .572            .552
H16           .587            .536         .562            .549
H17           .585            .528         .572            .550
H18           .586            .537         .563            .550
H19           .590            .531         .572            .552
H20           .587            .537         .563            .550

Note. Dashes mark digits that are illegible in the source copy.
TABLE 5

Effectiveness of Twenty Technical Scales

               Point-Biserial Validity Coefficients

         Key Development   Cross-Validation Samples   Weighted Average
Scale      Sample-1973        1971         1972        Cross-Validity

T01           .467            .214         --              --
T02           .391            --           --              --
T03           .473            --           --              --
T04           .403            --           --              --
T05           .467            .214         --              --
T06           .391            --           --              .231
T07           .212            .159         .100            .130
T08           .476            .215         .203            .209
T09           .476            .219         .199            .209
T10           .481            .223         .211            .217
T11           .275            .183         .141            .162
T12           .329            .211         .157            .184
T13           .464            .2--         .205            .212
T14           .483            .221         .208            .215
T15           .488            .224         .210            .217
T16           .484            .222         .204            .213
T17           .482            .244         .222            .233
T18           .483            .222         .209            .216
T19           .488            .224         .210            .217
T20           .484            .222         .204            .213

Note. Dashes mark entries or digits that are illegible in the source copy.
Generalization of the Findings

This study was comprehensive in that twenty item response weighting
procedures were examined in four separate problems, using three separate
samples. However, the extent to which the findings can be generalized
is limited by two considerations. The first limitation is that the
original "best" 75 responses were selected on the basis of the greatest
absolute differences in endorsement rates between the high and low
criterion groups. Other procedures for selecting items to be keyed could
produce results different from those reported herein.

A more important limitation is that the number of responses
originally keyed was not systematically varied. For each of the four
problems, those items containing the best 75 responses were chosen for
keying. It is expected that one or a few of the more sophisticated
differential weighting methods might evidence a distinct advantage over
the simple unit weighting method if the number of keyed responses was
decreased. Conversely, as key length increases, unit weighting may
evidence marked advantages over the more mathematically complex methods.

Finally, the "best" method should not be determined solely on the
basis of a validity coefficient. Scale test-retest reliability and
scoring costs are two pertinent factors which should be included in an
overall evaluation of alternative item response weighting procedures
for a particular application.